ABSTRACT

Because it is a true score model employing item parameters which are independent of the examined sample, item characteristic curve (ICC) theory offers several advantages over classical measurement theory. In this paper an approach to biased item identification using ICC theory is described and applied. The ICC theory approach is attractive in that it (1) appears to be sensitive largely to cultural variations in the trait gauged by test items, (2) does not assume total scores to be valid indicators of true ability, (3) places the identified degree of item bias on a quantified metric, and (4) is applicable to items of sufficiently varying degrees of difficulty. While sensitive to some factors other than item bias, namely lack of local independence, item inappropriateness, and poor parameter estimates, the approach may prove useful to the measurement field. (Author/BC)
Over the past several years, the issue of bias in intelligence testing, achievement testing, and testing for selection and placement has been of increasing concern to both the layperson and the measurement expert. In response to this concern, various models have been proposed for evaluating bias both in a measure as a whole and in the items within a measure. Models for evaluating bias in a measure as a whole (see the Spring 1976 issue of the Journal of Educational Measurement) are of primary interest to the test user, as they assist in the fair use of test results. Models for evaluating the items within a measure (see the reviews by Merz, 1976, and by Rudner, 1977a) are of prime interest to the test developer. These approaches have the potential to assist in developing valid and cross-culturally fair test items. This paper presents an improved method for analyzing item bias.

SOME CRITERIA FOR AN IMPROVED APPROACH 

In their reviews of the literature, Merz and Rudner discussed several of the approaches to biased item identification along a variety of dimensions. Although the intent of these discussions was to identify relative merits and weaknesses, some of the dimensions can be used to establish criteria for an improved approach. The following criteria and rationales are proposed.



The author is indebted to David Knight for his valued input on earlier drafts of this report and to the Office of Demographic Studies at Gallaudet College and an anonymous West Coast school district for providing the data used in the study.
An improved approach to biased item identification should:

1. be sensitive only to group differences in the factor gauged by the item

Item bias is concerned with whether an item measures the same trait across populations. An improved approach should identify only items which fail to do this and should not be overly sensitive to factors other than bias, e.g., group differences in ability, sampling, and item inappropriateness.

2. not assume total scores to be valid indicators of ability

Total observed scores are obtained by summing item responses. Consequently, the presence of biased items gives reason to suspect that the total scores contain additional error. An approach relying on this assumption could yield spurious results.

3. quantify the degree of item bias

While it is convenient to refer to an item as being biased or unbiased, this dichotomous distinction can be inflexible as well as misleading. The investigator needs to be able to vary the definition of what is "very biased" to suit the purposes of the study. An improved approach must at least have this flexibility and should preferably map the degree of item bias to a meaningful scale.

4. be applicable to items of varying difficulty

Some of the previously proposed approaches are limited in their ability to detect item bias in easy or difficult items. While no approach can be expected to detect item bias when almost all or none of the examinees respond correctly, an improved approach should, at least, be applicable over a wide range of item p-values.



This paper describes an approach which capitalizes on item characteristic curve (ICC) theory and employs a definition of bias similar to that used by Green and Draper, Scheuneman (1976), and Pine and Weiss (1976). This ICC theory approach appears attractive when measured against the above criteria.

A BRIEF OVERVIEW OF ICC THEORY

Latent trait or item characteristic curve theory relates the probability of a correct item response to a function of an examinee's underlying ability level ($\theta_i$) and characteristic(s) of the item. While the various models (Lord, 1952; Rasch, 1960; Birnbaum, 1968; Urry, 1970) differ in terms of the number of item parameters considered, they all describe the item parameter(s) independently of the examined sample. This attractive property has led to the development of some interesting applications in test development, adaptive testing, and equating, and may prove useful in detecting item bias.

One general, cumulative logistic model formalized by Birnbaum uses three item parameters: $a_g$, an item discrimination index; $b_g$, an item difficulty index; and $c_g$, a pseudo-guessing parameter. Using the notation $P(U_g = 1 \mid \theta_i)$ to represent the probability of a correct response to item $g$ given an examinee of ability level $\theta_i$, Birnbaum's three-parameter model states that

$$P(U_g = 1 \mid \theta_i) = c_g + (1 - c_g)\left[1 + \exp\{-1.7a_g(\theta_i - b_g)\}\right]^{-1}$$

This relationship between $\theta_i$ and $P(U_g = 1 \mid \theta_i)$ is illustrated in Figure 1.



insert Figure 1 about here

Figure 1: A hypothetical item characteristic curve



The probability of a correct response given a specific ability increases monotonically as true ability increases. For example, an examinee with a high true ability, e.g., $\theta_j$, has a high probability of responding correctly [$P(U_g = 1 \mid \theta_j) \approx 1.0$]. Conversely, an examinee of low true ability has a low probability of responding correctly, approaching the lower asymptote of the curve, $c_g$.
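To make the shape of the curve concrete, the following is a minimal Python sketch of Birnbaum's three-parameter model as written above, including the 1.7 scaling constant. The function name `icc_3pl` and the item parameters are hypothetical, chosen only for illustration; the checks mirror the two properties just described, a monotonic rise with ability and a lower asymptote at $c_g$.

```python
import math


def icc_3pl(theta: float, a: float, b: float, c: float) -> float:
    """P(U_g = 1 | theta) under Birnbaum's three-parameter logistic model.

    a: discrimination, b: difficulty, c: pseudo-guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


# Hypothetical item parameters, for illustration only.
a, b, c = 1.0, 0.0, 0.20

thetas = [-4.0, -2.0, 0.0, 2.0, 4.0]
probs = [icc_3pl(t, a, b, c) for t in thetas]

# The probability of a correct response rises monotonically with ability.
assert all(p_low < p_high for p_low, p_high in zip(probs, probs[1:]))

# Low-ability examinees approach c; high-ability examinees approach 1.0.
print(round(icc_3pl(-6.0, a, b, c), 3))  # 0.2
print(round(icc_3pl(6.0, a, b, c), 3))   # 1.0
```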

The ability value at the inflection point of the curve, $b_g$, is referred to as the item difficulty parameter in that it indicates the relative position of the curve along the $\theta$ axis. The more the curve is positioned to the right, the more ability is necessary for an examinee to have a good probability of a correct response. The slope of the curve at $b_g$ helps define a third parameter, $a_g$. This value, referred to as the discrimination parameter, indicates the power of the item to separate examinees of close but unequal levels of ability. Although the item parameters and $\theta$ are on a common metric, these item parameters describe characteristics of the item independently of the examinee group. Full explanations and development of this and other mental measurement models can be found in Jensema (1972) and in Lord and Novick (1974).
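A short derivation from the three-parameter model supports these interpretations. Writing $L = 1.7a_g(\theta - b_g)$ and $\psi(L) = [1 + \exp(-L)]^{-1}$ for the logistic function (notation introduced here only for convenience):

$$
\begin{aligned}
P(U_g = 1 \mid \theta) &= c_g + (1 - c_g)\,\psi(L), \\
\frac{dP}{d\theta} &= 1.7a_g(1 - c_g)\,\psi(L)\,[1 - \psi(L)], \\
\frac{d^2P}{d\theta^2} &= (1.7a_g)^2(1 - c_g)\,\psi(L)\,[1 - \psi(L)]\,[1 - 2\psi(L)].
\end{aligned}
$$

The second derivative vanishes where $\psi(L) = 1/2$, that is, at $\theta = b_g$, so $b_g$ locates the inflection point. At that point $P(U_g = 1 \mid b_g) = (1 + c_g)/2$ and the slope is $1.7a_g(1 - c_g)/4 = .425\,a_g(1 - c_g)$; for a fixed $c_g$, the slope at $b_g$ is proportional to $a_g$.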

ICC THEORY AND BIASED ITEM IDENTIFICATION

The only previous applications of ICC theory for identifying biased items found in the literature were those of Green and Draper, Wright et al. (1976), and Lord (in press). Green and Draper used observed total scores as estimates of examinees' abilities, the $\theta_i$'s, and the proportions of examinees responding correctly at each total score level as estimates of $P(U_g = 1 \mid \theta_i)$. Their procedure called for plotting estimated ICC's for each item separately for each culture group and comparing the curves.

By this and other latent trait theory approaches, an item is unbiased if examinees of the same ability level, but of different cultural affiliations, have equal probabilities of responding correctly. That is, an item is unbiased if the estimated ICC's obtained from the various culture groups are identical. As an example of a biased item, consider the two hypothetical curves shown in Figure 2. These curves are based on responses by two different culture groups to the same item. Total observed scores are used as estimates of $\theta_i$, and proportions of examinees responding correctly are used as estimates of $P(U_g = 1 \mid \theta_i)$. The curves are not identical, since the location parameters for the two curves are not equal. Such an item can be considered biased in that examinees of the same ability level, e.g., x = 58%, but from different culture groups often do not have similar proportions of correct responses.



insert Figure 2 about here 
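The Green and Draper procedure described above can be sketched in a few lines of Python. Observed total scores stand in for $\theta_i$, and the proportion correct at each total score level stands in for $P(U_g = 1 \mid \theta_i)$, computed separately for each culture group. The record layout, the function name, and the small data set are hypothetical.

```python
from collections import defaultdict


def empirical_icc(records):
    """Proportion correct on one item at each total-score level, by group.

    records: iterable of (group, total_score, correct) tuples, where
    correct is 1 for a right answer and 0 otherwise.
    Returns {group: {total_score: proportion_correct}}.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [n_right, n]
    for group, score, correct in records:
        cell = counts[group][score]
        cell[0] += correct
        cell[1] += 1
    return {
        group: {s: n_right / n for s, (n_right, n) in sorted(levels.items())}
        for group, levels in counts.items()
    }


# Hypothetical responses to one item from two culture groups.
data = [
    ("A", 20, 0), ("A", 20, 1), ("A", 30, 1), ("A", 30, 1),
    ("B", 20, 0), ("B", 20, 0), ("B", 30, 0), ("B", 30, 1),
]
for group, curve in empirical_icc(data).items():
    print(group, curve)
# A {20: 0.5, 30: 1.0}
# B {20: 0.0, 30: 0.5}
```

Unequal proportions at the same score level suggest bias, but the comparison still rests directly on observed total scores.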



While this approach is appealing, it fails to meet the second and third criteria. The approach as used by Green and Draper directly incorporates total observed scores, and quantification of the degree of item bias is difficult (an eyeballing procedure is used to identify a "very biased item").



In a recent Monte Carlo investigation of test bias models, Pine and Weiss (1976) used a similar operational definition of bias. Specifically, they maintained equal Birnbaum $a_g$ parameter values between groups and varied the $b_g$ parameter values to vary the amount of bias.



Rather than using total observed scores as estimates of $\theta_i$ and proportions as estimates of $P(U_g = 1 \mid \theta_i)$, more accurate values can be obtained using one of the recent methods of parameterization (Urry, 1975; W... and Lord, 1973). During parameterization, the metric used for the $\theta$ scale is defined by the ability variance in the examined sample. In order to compare parameters obtained from two different examinee groups, the obtained values must be equated. Lord and Novick (1974, Chapter 16.11) and Rudner (1977b) have shown that this can be accomplished by computing the regressions of the parameter values based on one group of examinees on the parameter values based on the other group of examinees. The equated ICC's will be identical when the restrictions of the model are met, that is, when the measure

(1) is unidimensional,

(2) contains locally independent items, and

(3) has error-free parameter estimates.

Since test items, by design, are usually locally independent and since accurate parameter estimates can usually be obtained, non-identical equated ICC's would be largely indicative of non-unidimensionality. That is, aberrant equated ICC's would indicate that the item measures (a) different traits across cultures (bias) or (b) a trait other than that gauged by the other items (inappropriateness).
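A minimal Python sketch of this regression-based equating, assuming ordinary least squares and taking group 1's metric as the target. The two regressions follow the steps listed under Procedures: group 1's $a_g$ values are regressed through the origin on group 2's, and group 1's $b_g$ values are regressed on group 2's. Function names and parameter values are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope k for y ~ k * x (regression through the origin)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def fit_line(x, y):
    """Least-squares intercept and slope for y ~ alpha + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sxy / sxx
    return my - beta * mx, beta


def equate_to_group1(a1, b1, a2, b2):
    """Put group 2's a and b estimates on group 1's metric.

    Group 1's a values are regressed through the origin on group 2's, and
    group 1's b values are regressed on group 2's; the fitted equations are
    then applied to group 2's estimates.
    """
    k = slope_through_origin(a2, a1)
    alpha, beta = fit_line(b2, b1)
    a2_equated = [k * a for a in a2]
    b2_equated = [alpha + beta * b for b in b2]
    return a2_equated, b2_equated


# Hypothetical estimates: group 2's metric is a rescaled copy of group 1's.
a2 = [0.8, 1.0, 1.2, 1.6]
b2 = [-1.0, 0.0, 1.0, 2.0]
a1 = [0.95 * a for a in a2]
b1 = [0.9 * b - 0.3 for b in b2]

a2_eq, b2_eq = equate_to_group1(a1, b1, a2, b2)
print([round(v, 2) for v in a2_eq])  # close to a1
print([round(v, 2) for v in b2_eq])  # close to b1
```

Fitting the two regressions separately, as described, does not force the $a_g$ slope to equal the reciprocal of the $b_g$ slope, as a single linear change of the $\theta$ metric would.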

One could evaluate the residuals from the regressions to gauge the extent of item bias. Rather than using residuals of the form $Y - \hat{Y}$, which cannot be readily compared between applications, perpendicular item-to-regression-line distances could be used. Such a method would be similar to that suggested by Angoff (1972) for use with a regression-like technique whereby transformed p-values are examined for differences between groups.
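The difference between the two kinds of residual can be sketched in Python. For a fitted line $y = \alpha + \beta x$, the perpendicular distance of an item's point $(x, y)$ is $|y - \alpha - \beta x| / \sqrt{1 + \beta^2}$. The function names, point, and line below are hypothetical.

```python
import math


def vertical_residual(x, y, alpha, beta):
    """Ordinary residual Y - Y_hat from the line y = alpha + beta * x."""
    return y - (alpha + beta * x)


def perpendicular_distance(x, y, alpha, beta):
    """Distance from the point (x, y) to the line y = alpha + beta * x,
    measured at right angles to the line."""
    return abs(y - alpha - beta * x) / math.sqrt(1.0 + beta * beta)


# Hypothetical item point (group 2 value, group 1 value) and fitted line.
x, y = 1.0, 2.0
alpha, beta = 0.0, 1.0
print(vertical_residual(x, y, alpha, beta))                 # 1.0
print(round(perpendicular_distance(x, y, alpha, beta), 3))  # 0.707
```

Unlike $Y - \hat{Y}$, the perpendicular distance to a given line is unchanged if the roles of the two axes are swapped.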


An alternate approach is to refine the procedure used by Green and Draper and use the equated parameter values to plot ICC's for each item by culture group. The resultant scale would not be dependent upon potentially biased total scores, and $P(U_g = 1 \mid \theta_i)$ would be more accurately represented. However, eyeballing would still be necessary, and the third criterion, requiring quantification, would not be met.

Wright, Mead and Draba (1976) have described an approach using the Rasch one-parameter ($b_g$) model whereby goodness-of-fit residuals are examined for between-group differences. A more attractive approach was used by Lord (in press), who tested for a significant difference between equated ICC's. An asymptotic significance test based on the summed variance-covariance matrices of the $a_g$ and $b_g$ parameter estimates was employed.

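The approach preferred by the author is to compute the area between the two equated ICC's. This value would be low for relatively unbiased items and high for relatively biased ones. In most instances this area, defined by

$$A_g = \int_{-\infty}^{\infty} \left| P(U_g = 1 \mid \theta) - P'(U_g = 1 \mid \theta) \right| d\theta,$$

where $P(U_g = 1 \mid \theta)$ and $P'(U_g = 1 \mid \theta)$ define the equated ICC's for the two groups, can be readily approximated on a high-speed computer by

$$A_g \approx \sum_{\theta = -5.000}^{5.000} \left| P(U_g = 1 \mid \theta) - P'(U_g = 1 \mid \theta) \right| \Delta\theta,$$

where the sum runs from $-5.000$ to $5.000$ in steps of $\Delta\theta = .005$.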
This method places bias on a ratio scale and overcomes the problems of eyeballing and of simultaneously analyzing differences in item discrimination and difficulty.


TWO APPLICATIONS OF THE ICC THEORY APPROACH 

The ICC theory approach to biased item identification is illustrated for two different situations. The first illustrates the approach when there are no biased items in the item pool. The second represents the approach as it might be used in test development. In the first situation, examinees from one culture group were randomly divided into two groups of different mean ability. Thus, two groups of the same cultural affiliation but different levels of ability were formed. Treating these groups as though they represented different cultures and applying the ICC approach resulted in a pseudo-culture group comparison similar to that employed by Jensen (1973) in evaluating an analysis of variance approach to biased item identification. Since both groups are of the same cultural affiliation, item aberrance should be minimal.

Item Pool - The 1973 Stanford Achievement Test, Form A, Primary 2 Battery, Reading Comprehension Subtest (SAT), which, item for item, is equivalent to the Stanford Achievement Test, Hearing Impaired Version, Level 2, Reading Comprehension Subtest, formed the initial item pool for use in this study. The SAT consists of 16 paragraphs with a total of 48 four-choice items. According to the test publishers, emphasis is placed on comprehending disconnected discourse. It was anticipated that the SAT would contain several items biased in favor of one of the sampled culture groups.
Subjects - The study incorporates item responses made by large samples of examinees from two diverse culture groups. The first is composed of students in United States programs for the hearing impaired. The second is representative of the population for which the SAT was designed, namely normal-hearing students in public schools. One major difference between these groups is their exposure to and ability to use the English language (see Stokoe, 1976, for an excellent discussion of the social and cultural aspects of the deaf community).

As part of the Annual Survey of Hearing Impaired Children and Youth, the Office of Demographic Studies at Gallaudet College collected item responses to the entire Stanford Achievement Test, Hearing Impaired Version. From their national random sample of 6,182 hearing impaired students, the sample of 2,637 examinees taking the Level 2 battery was extracted.

One thousand six hundred three (1,603) students enrolled in a large West Coast public school district and taking the SAT in the spring of 1976 composed the sample of examinees representative of the population for which the measure was developed.
Procedures 

The steps involved in applying the ICC theory approach are:

1. Parameterize on each group separately (Urry's iterative minimum chi-square technique was used).

2. Equate the scales by

(a) regressing the $a_g$ parameters obtained for the first group, through the origin, on the $a_g$ parameters obtained for the second group, and

(b) regressing the $b_g$ parameters obtained for the first group on those obtained for the second.¹


¹The magnitude of the $r^2$ inversely reflects the aggregate amount of aberrance. When the $r^2$ is low, and hence many aberrant items are present, it is wise to trim items and recompute the regressions. This will prevent extremely biased items from overly distorting the regression equations used to equate the ICC's.



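3. The indicator of the degree of bias for each item $g$ is the area between the equated ICC's, which is approximated by

$$A_g \approx \sum_{\theta = -5.000}^{5.000} \left| P(U_g = 1 \mid \theta) - P'(U_g = 1 \mid \theta) \right| \Delta\theta,$$

where the sum runs from $-5.000$ to $5.000$ in steps of $\Delta\theta = .005$.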
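The footnote's advice to trim items when $r^2$ is low can be sketched as follows, shown for the $b_g$ regression only. The trimming rule (drop the item with the largest absolute residual until $r^2$ reaches a cutoff), the .90 cutoff, and the minimum of five items are hypothetical choices, not taken from this paper; a similar loop could be applied to the $a_g$ regression through the origin.

```python
def fit_line(x, y):
    """Least-squares intercept, slope, and r-squared for y ~ alpha + beta * x.

    Assumes both x and y vary (nonzero sums of squares).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta, sxy * sxy / (sxx * syy)


def trimmed_b_regression(b1, b2, min_r2=0.90, min_items=5):
    """Regress group 1's b values on group 2's, trimming aberrant items.

    Hypothetical rule: while r-squared is below min_r2 (and more than
    min_items remain), drop the item with the largest absolute residual
    and refit. Returns (alpha, beta, r2) and the indices of kept items.
    """
    keep = list(range(len(b1)))
    while True:
        x = [b2[i] for i in keep]
        y = [b1[i] for i in keep]
        alpha, beta, r2 = fit_line(x, y)
        if r2 >= min_r2 or len(keep) <= min_items:
            return (alpha, beta, r2), keep
        worst = max(keep, key=lambda i: abs(b1[i] - alpha - beta * b2[i]))
        keep.remove(worst)


# Hypothetical b estimates: the last item is far off the common line.
b2 = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
b1 = [-2.1, -1.2, -0.3, 0.6, 1.5, -1.0]
(alpha, beta, r2), kept = trimmed_b_regression(b1, b2)
print(kept, round(alpha, 2), round(beta, 2), round(r2, 2))
# [0, 1, 2, 3, 4] -0.3 0.9 1.0
```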

For the pseudo-group comparison, the hearing impaired examinees were randomly divided into two groups with different mean observed scores. This was accomplished by specifying, a priori, the desired observed score distributions of the two groups of examinees. The resultant numbers of examinees were then converted to proportions of the total number of examinees needed for group assignment and proportions of examinees needed for each group. For each examinee, a random number was drawn and compared with the appropriate proportions to determine group assignment. A total of 528 examinees were lost due to the overabundance of examinees at certain observed score levels.
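One possible reading of this assignment procedure is sketched below in Python. At each observed score level, the desired counts for the two pseudo-groups are divided by the number of examinees available at that level, and a uniform random number routes each examinee to group 1, to group 2, or, when more examinees are available than needed, to neither. The function name, target counts, and seed are hypothetical, and because each examinee is assigned independently, realized group sizes only approximate the targets.

```python
import random
from collections import Counter


def assign_pseudo_groups(scores, target1, target2, seed=0):
    """Randomly split examinees into two pseudo-groups with chosen
    observed-score distributions.

    scores: observed total score for each examinee.
    target1, target2: desired number of examinees per score level.
    Returns 1, 2, or None (not assigned) for each examinee.
    """
    rng = random.Random(seed)
    available = Counter(scores)
    groups = []
    for s in scores:
        n = available[s]
        p1 = min(target1.get(s, 0) / n, 1.0)
        p2 = min(target2.get(s, 0) / n, 1.0 - p1)
        u = rng.random()
        if u < p1:
            groups.append(1)
        elif u < p1 + p2:
            groups.append(2)
        else:
            groups.append(None)  # surplus examinee at this score level
    return groups


# Hypothetical pool: 100 examinees scoring 10 and 100 scoring 20.
scores = [10] * 100 + [20] * 100
target1 = {10: 30, 20: 60}  # group 1 weighted toward higher scores
target2 = {10: 60, 20: 30}  # group 2 weighted toward lower scores
groups = assign_pseudo_groups(scores, target1, target2)


def group_mean(g):
    """Mean observed score of the examinees assigned to group g."""
    members = [s for s, grp in zip(scores, groups) if grp == g]
    return sum(members) / len(members)


print(round(group_mean(1), 1), round(group_mean(2), 1), groups.count(None))
```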

A PSEUDO-CULTURE GROUP COMPARISON 

Summary statistics for the two pseudo-culture groups are shown in Table 1. The groups differed in mean observed scores, thus implying differences in group ability.

insert Table 1 about here
Table 1

Test Statistics for the Two Pseudo-Culture
Groups on the SAT

             N     Mean    s.d.   KR-20
Group 1    1079    23.7    7.43    .83
Group 2    1030    20.9    6.97    .31



The equated and unequated parameter value estimates and the identified degrees of aberrance are shown in Table 2. For ease of interpretation, the identified degrees of aberrance are plotted in Figure 3.

insert Table 2 and Figure 3 about here 



The reader should note that with the exception of items 28 and 39 (and items 21 and 44, which had negative point-biserials and could not be parameterized), all the identified degrees of aberrance are low, falling below .4. This value can be viewed as representing measurement noise in the form of parameterization error and slight deviations from unidimensionality and local independence.

A closer examination of the more aberrant items provides some added insight. Items 28 and 39 were more aberrant because of local dependence, lack of within-group unidimensionality, or poor parameter estimates. The $b_g$ parameters for these items were extremely high for the second group of examinees, namely 2.77 and 3.91, respectively. This can be interpreted as meaning that, ignoring guessing, an examinee's ability must be 2.77 (3.91) standard deviations above the group mean ability to have a better than even chance of responding correctly. Since relatively few examinees were of this ability level, parameterization became tenuous, and it is felt that the slight aberrance of these items was due to abnormally high parameterization error.

A DIVERSE CULTURE GROUP COMPARISON

Summary statistics for the two diverse culture groups are shown in
Table 3. The equated and unequated parameter value estimates and the



Table 2

Equated and Unequated Parameter Estimates and
Degrees of Aberrance for the Pseudo-Culture Group Comparison

           Group 1         Group 2 (Equated)   Group 2 (Unequated)
Item      a      b      c      a      b      c      a      b      c Aberrance
   1    .82   -.22    .20    .88   -.07    .20    .93    .26    .27       .12
   2   1.26  -1.42    .31   1.02  -1.25    .31   1.08  -1.10    .28       .15
   3    .99  -1.32    .26   1.03  -1.18    .26   1.09  -1.02    .28       .10
   4   1.24    .23    .38   1.24    .14    .38   1.31    .50    .35       .06
   5   1.95  -1.54    .37   1.97  -1.41    .37   2.08  -1.28    .39       .08
   6    .64    .83    .20    .68    .47    .20    .72    .88    .19       .28
   7   1.02    .42    .18   1.21    .71    .18   1.28   1.16    .31       .24
   8    .64    .93    .28    .64    .67    .28    .67   1.11    .25       .19
   9    .79    .19    .11    .66    .30    .11    .70    .69    .11       .19
  10    .96   1.40    .33   1.10   1.33    .33   1.16   1.87    .38       .08
  11    .81    .10    .17   1.03    .16    .17   1.09    .52    .24       .18
  12   1.02   -.12    .14    .91    .07    .14    .96    .42    .19       .17
  13   1.42  -1.10    .29   1.38  -1.05    .29   1.45   -.87    .32       .04
  14   1.09    .66    .33    .78    .55    .33    .82    .97    .24       .21
  15   1.03   -.82    .21   1.00   -.77    .21   1.05   -.54    .24       .04
  16   1.10   1.45    .32    .79   1.28    .32    .83   1.81    .25       .22
  17   1.39   1.56    .33    .89   1.95    .33    .94   2.58    .31       .31
  18    .91   1.78    .35    .63   1.60    .35    .66   2.18    .25       .26
  19    .62   -.54    .13    .76   -.22    .13    .80    .09    .15       .32
  20   1.01   1.84    .32    .76   2.14    .32    .80   2.80    .33       .24
  22   1.49   3.68    .42   1.42   3.38    .42   1.50   4.23    .42       .17
  23    .83   -.19    .28    .80   -.67    .28    .84   -.43    .20       .34
  24    .75   -.60    .15    .94   -.57    .15    .99   -.31    .22       .19
  25    .28   1.91    .30    .24   2.79    .30    .25   3.55    .30       .36
  26    .68    .79    .26    .83   1.00    .26    .88   1.49    .36       .21
  27   1.58   2.04    .27   1.77   1.89    .27   1.87   2.51    .28       .11
  28    .70   3.01    .40    .72   2.11    .40    .76   2.77    .39       .51
  29   1.02    .90    .28    .93    .76    .28    .98   1.21    .24       .11
  30   1.11   2.03    .24   1.13   2.22    .24   1.19   2.90    .27       .14
  31   1.16    .97    .27   1.09   1.09    .27   1.15   1.59    .31       .09
  32    .94    .47    .16    .94    .55    .16    .99    .97    .25       .07
  33    .89   1.85    .28   1.58   1.53    .28   1.67   2.10    .37       .34
  34   1.80   -.25    .22   1.53   -.07    .22   1.61    .26    .29       .14
  35   1.37    .46    .21   1.70    .57    .21   1.79   1.00    .25       .12
  36   1.75   -.14    .17   1.54    .12    .17   1.62    .48    .27       .22
  37   2.13    .09    .16   1.88    .15    .16   1.98    .51    .19       .06
  38    .97    .07    .38    .98   -.30    .38   1.03    .00    .19       .23
  39   2.19   2.16    .22    .99   3.10    .22   1.04   3.91    .34       .74
  40    .60    .29    .36    .70   -.30    .36    .74   -.01    .15       .38
  41    .92   1.92    .24      ?   2.37    .24    .75   3.07    .32       .35
  42   1.11   2.70    .31   1.01   3.26    .31   1.06   4.09    .39       .37
  43   1.13   3.12    .32   1.25   2.68    .32   1.32   3.42    .34       .29
  45   1.03   2.32    .27   1.23   1.85    .27   1.30   2.47    .26       .34
  46      ?    .65    .15   1.00    .70    .15   1.05   1.14    .21       .14
  47    .51   1.73    .21    .66   1.41    .21    .70   1.96    .24       .32
  48    .80    .25    .11   1.06    .44    .11   1.12    .85    .20       .26


[Figure 3: degree of aberrance (vertical axis) plotted against item number (horizontal axis)]

Figure 3: Plot of the degrees of item aberrance identified in the pseudo-culture group comparison.
identified degrees of aberrance are shown in Table 4. In Figure 4, the identified degrees of aberrance are plotted.

insert Table 3, Table 4 and Figure 4 about here



The test developer can use the identified degrees of aberrance to determine which items to consider as biased. If one wishes to screen items during test development, then a liberal definition of a very biased item, e.g., aberrance > .4, could be used. (In test development it usually would be wise to liberally reject items suspected of being biased.) On the other hand, if one wishes to identify salient characteristics of very biased items, a more conservative value, e.g., aberrance > .7, could be used.
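Applying either cutoff is a simple filter. In the minimal sketch below, the function name is hypothetical and the values are five of the aberrance indices from Table 2. Because no item was expected to be biased in that pseudo-culture comparison, the items flagged here reflect estimation problems rather than bias.

```python
def flag_items(aberrance, threshold):
    """Return the item numbers whose aberrance exceeds the threshold."""
    return sorted(item for item, value in aberrance.items() if value > threshold)


# Aberrance values for five items of the pseudo-culture comparison (Table 2).
table2_subset = {25: 0.36, 28: 0.51, 39: 0.74, 40: 0.38, 42: 0.37}

print(flag_items(table2_subset, 0.4))  # liberal screen: [28, 39]
print(flag_items(table2_subset, 0.7))  # conservative: [39]
```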

In an exploratory study incorporating these two populations and using Angoff's transformed p-value regression technique, Rudner (1977c) pooled items identified as very biased from 13 measures and found six English constructions causing undue difficulty for hearing impaired examinees. Items 4, 16, 17, 22 and 25, which had aberrances above .5 and which were biased in favor of hearing examinees, fell into one of these six categories.

EYEBALLING

One could use the equated parameter values to identify biased items in a manner analogous to the procedure used by Green and Draper, as illustrated in Figures 5 and 6. In Figure 5 (representing a relatively unbiased item), the ICC's obtained for both groups are quite similar. Examinees of the same true ability but different cultural affiliations have similar probabilities of responding correctly. The ICC's shown in Figure 6, however, are quite different. Examinees of the same latent
Table 3

Test Statistics for the Two Diverse-Culture
Groups on the SAT

                                N     Mean    s.d.   KR-20
Hearing Examinees            1603     28.9   12.44    .95
Hearing Impaired Examinees   2637     21.6    7.42    .33


Table 4 



Equated and Unequated Parameter Estimates and
Degrees of Aberrance for the Diverse-Culture Group Comparison



Item 



Hearing Impaired 



Hearing 
(Equated)



Hearing 
(Unequated) 



Aberrance 



1 1.01 

2 1.42 

3 1.64 

4 1,44 

5 1.48 

6 .86 

7 1.36 

8 1.27 

9 1.69 
10 1.45 

. 11 1.62 

12 1.66 

13 1.72. 

14 1.16 

15 1.84 

16 1.91 

17 1.21 

18 1.00 

19 1.67 

20 1.3S 

22 .28 

23 .76 

24 .59 

25 2.29 

26 2.40 

27 2.42 
.28 2.14 

29 2.64 

30 2.44 

31 2.12 

32 1.38 

33 1.29 
■ 34 2.07 

35 2.20- 

36 2.79 

37 ! 2.15 

38 1.S6 

39 1.57 

40 .96 

41 1.67 

42 1.50 

43 1.32 
.45 1.72 

46 1.54 

.47 1.63 

-ma :i 2.07 



- .83 .08 1.18 

- .83 .07 1.59 
-1.09 .12 1.44 
-1.17 .15 1.71 
-1.14 .23 2.42 

- ,22 .14 ,-1.03 

- .32 .14 1.62 

- .59 .05 1.05 

- .30 .02 1.09 

- .20 .13 1.69 
,08 .12 1.22 

. .05 .12 1.31 

- .62 .12 1.87 
.05 .23 1.34 

- .36 .02 1.43 

- .57 .07 1.40 

- .66 .06 1.51 
1.22 .23 1.07 

- .19 .13 .95 
.42 .25 1.28 

- .34 .00 2;2S 

- .36 .03 1.14 

- .62 .04 1.11 

- ,17' .09 .58 
.22- .08 1.18 
.15 .17 2.66 
.45 .23 1.02 
.10 .01 1.46 

- .02 .14 1.72 

- .02 .03 1.60 
.08 .19 1.36 
.29 .10 1.60 
.53 .05 2.16 

- .33 .13 1.99 

- .35 ,00 2.15 

- .35 ^.08 2.62 

- ,23 ,11- - 1.38 
.44 .08 2.69 

- ,40 ,04 .89 
.62 .20 1,19 
.78 .14 1.63 
,51 .11 1,51 

1.07 .31 1.44 

,12 .08 1.19 

.58 .24 .69. 

,00 .04 ... l; 19.;: 



- .40 

- .78 

- .76 

- ;29 

- .86 

- .08 

- .16 

- .07 

- .22 
.08 

- .30 

- .37 

- .74 

- .15 

- .61 
.04 
.15 

, .14 

- .45 
.20 
.61 

- .55 

- .55 
.33 

- .03 
.18 
.38 

- .08 
.23 

- .03 

- .16 
.19 

- .37 

- .16 
~ .35 

- .30 

- .38 
.28 
.31

Figure 4: Plot of the degrees of item aberrance identified in the diverse-culture group comparison.




ability but from different culture groups usually have different probabilities of responding correctly. Thus, the item is relatively biased.

insert Figure 5 and Figure 6 about here



Eyeballing the ICC's in this manner allows the researcher to get a feel for the bias. Compare Figure 7 with Figure 6. Both items have approximately the same amount of bias. The item shown in Figure 6 is biased over a broad range of examinee abilities, while the item shown in Figure 7 is very biased over a narrower range. Further, eyeballing clearly illustrates which group is favored by the item. Item 18 favors hearing impaired examinees and item 17 favors hearing examinees. Thus, eyeballing offers advantages that the single numeric index used to quantify bias does not.



insert Figure 7 about here 
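The contrast between Figures 6 and 7 can be illustrated numerically with two hypothetical items. Each item's two curves differ only by a .5 shift in $b_g$, with $c_g = 0$, but the items differ in $a_g$. The minimal Python sketch below assumes the three-parameter form and the ±5 grid used for the area index; the .05 cutoff used to measure the "range" of bias and the function names are arbitrary illustrative choices.

```python
import math


def icc_3pl(theta, a, b, c):
    """P(U_g = 1 | theta) under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def bias_profile(item1, item2, lo=-5.0, hi=5.0, step=0.005, cutoff=0.05):
    """Return (area, peak |P - P'|, width of theta range where |P - P'| > cutoff)."""
    n_steps = int(round((hi - lo) / step))
    area = peak = width = 0.0
    for k in range(n_steps + 1):
        theta = lo + k * step
        d = abs(icc_3pl(theta, *item1) - icc_3pl(theta, *item2))
        area += d * step
        peak = max(peak, d)
        if d > cutoff:
            width += step
    return area, peak, width


# Hypothetical items: each group's curve differs only by a .5 shift in b.
flat = bias_profile((0.5, 0.0, 0.0), (0.5, 0.5, 0.0))
steep = bias_profile((2.0, 0.0, 0.0), (2.0, 0.5, 0.0))

for label, (area, peak, width) in (("flat", flat), ("steep", steep)):
    print(f"{label}: area={area:.2f} peak={peak:.2f} width={width:.2f}")
```

Both areas come out close to .5, yet the steep item concentrates its difference in a much narrower band of $\theta$ with a larger peak, a pattern an eyeball comparison picks up and a single area value does not.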



DISCUSSION 

The reader may have noted some of the following possible objections to the ICC theory approach:

1. aberrance may be indicative of things other than item bias
2. the directionality of bias is not identified
3. the approach is not applicable to items with extreme p-values
4. not all items fit the latent trait theory model

The first objection stems from the fact that the approach identifies items which are biased, are locally dependent, measure a trait other than that measured by the other items, and/or have poor parameter estimates. Since, in developing a measure, one would want to eliminate an item with any of the first three of these characteristics, and since good parameter estimates are usually obtainable (at least when $-2 < b_g < 2$), this limitation, while existing, is felt to be relatively minor.
Even though the ICC theory approach does not identify the directionality of bias, directionality can often be determined. When examinees from one culture group consistently have higher probabilities of responding correctly to a particular item, the item can be said to favor that group. This can be readily seen by comparing the equated $b_g$ parameter estimates or, better, by eyeballing the equated ICC's. However, the reader should be aware that bias is not always directional. Consider the ICC's shown in Figure 8. Low-ability hearing examinees and high-ability hearing impaired examinees are favored. Overall, one cannot say the item favors any one group, yet a fair amount of bias is present (aberrance = .61).

Thus, directionality is not always definable, nor should it be.

insert Figure 8 about here 
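A sketch of how directionality might be checked from the equated ICC's: evaluate the difference between the two curves over the same $\theta$ grid used for the area index and record whether each group is ever favored. This minimal Python version assumes the three-parameter form; the tolerance, labels, function names, and example items are hypothetical.

```python
import math


def icc_3pl(theta, a, b, c):
    """P(U_g = 1 | theta) under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def direction_of_bias(item1, item2, lo=-5.0, hi=5.0, step=0.005, tol=1e-3):
    """Report which group an item favors over a theta grid.

    item1, item2: equated (a, b, c) tuples for groups 1 and 2.
    Returns "group 1", "group 2", "non-directional" (each group favored
    somewhere by more than tol), or "none".
    """
    favors1 = favors2 = False
    n_steps = int(round((hi - lo) / step))
    for k in range(n_steps + 1):
        theta = lo + k * step
        d = icc_3pl(theta, *item1) - icc_3pl(theta, *item2)
        if d > tol:
            favors1 = True
        elif d < -tol:
            favors2 = True
    if favors1 and favors2:
        return "non-directional"
    if favors1:
        return "group 1"
    if favors2:
        return "group 2"
    return "none"


# Hypothetical items.
shifted = ((1.0, 0.0, 0.2), (1.0, 0.5, 0.2))   # easier for group 1 everywhere
crossing = ((0.5, 0.0, 0.2), (2.0, 0.0, 0.2))  # curves cross at theta = 0
print(direction_of_bias(*shifted))   # group 1
print(direction_of_bias(*crossing))  # non-directional
```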

In the pseudo-culture group comparison, two items were falsely identified as containing fair amounts of bias. Item 28 had an aberrance of .51 and item 39 an aberrance of .74. Closer examination of these items revealed that their item difficulties were extreme. This illustrates that the ICC theory approach, like many of the other approaches for biased item identification, is not always applicable to items with extreme p-values. In addition, not all items can fit the Birnbaum latent trait model. Items 21 and 44 in both comparisons could not be parameterized because of near zero or negative item-test point-biserial correlations. This indicates that ability was poorly or negatively related to the probability of a correct response. Since such items are the first to be eliminated in test development, the inability to parameterize all items does not seriously affect the utility of the approach.
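Items like 21 and 44 can be screened before parameterization by computing item-total point-biserial correlations. The minimal Python sketch below computes each as the Pearson correlation between the 0/1 item score and the total score (which includes the item itself); the response data, the function name, and the .05 screening cutoff are hypothetical.

```python
import math


def point_biserials(responses):
    """Item-total point-biserial correlations.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Returns one Pearson correlation per item between the item score and
    the total score (the total includes the item itself). Items that
    everyone passes or everyone fails get NaN.
    """
    n = len(responses)
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    r = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        p = sum(item) / n
        sd_i = math.sqrt(p * (1.0 - p))
        if sd_i == 0.0 or sd_t == 0.0:
            r.append(float("nan"))
            continue
        cov = sum((x - p) * (t - mean_t) for x, t in zip(item, totals)) / n
        r.append(cov / (sd_i * sd_t))
    return r


# Hypothetical 0/1 responses of five examinees to three items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
r = point_biserials(responses)

# Hypothetical screening rule: flag items whose r is not above .05.
flagged = [j + 1 for j, rj in enumerate(r) if not rj > 0.05]
print(flagged)  # [3]
```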

Although these limitations are present, the ICC theory approach appears to have several attractive properties. Most importantly, the approach utilizes a true score model, thereby lifting the tenuous assumption that observed scores are valid indicators of true ability. This was established as a criterion for an improved approach, since violations of this assumption can yield spurious results.

Second, the approach appears to be sensitive to item bias. This contention was supported empirically in that the pseudo-group comparison yielded few aberrant items and in that the actual application identified items whose formats had previously been classified as suspect.

Third, the approach places each item on a metric that identifies the degree of item bias. This allows the test developer to evaluate an index of bias along with traditional indices, such as item difficulties (e.g., p-values), discrimination indices (e.g., point-biserial correlations), and dimensionality (e.g., factor loadings), to determine which items to retain for a final item pool.

Lastly, the approach is applicable to items of varying difficulty, as long as the $b_g$ parameters are not overly extreme. Thus, the approach can be applied to most norm-referenced measures.
SUMMARY



Because it is a true score model employing item parameters which are independent of the examined sample, item characteristic curve theory offers several advantages over classical measurement theory. In this paper an approach to biased item identification using ICC theory was described and applied.

The ICC theory approach is attractive in that it (1) appears to be sensitive largely to cultural variations in the trait gauged by test items, (2) does not assume total scores to be valid indicators of true ability, (3) places the identified degree of item bias on a quantified metric, and (4) is applicable to items of sufficiently varying degrees of difficulty. While sensitive to some factors other than item bias, namely lack of local independence, item inappropriateness, and poor parameter estimates, the approach may prove useful to the measurement field.